Add a script to plot multi-run experiment results #122

lingzhq · 2025-07-14T13:07:03Z

Description

This PR adds a standalone utility for comparing Trinity's RFT experiments, which have to be run several times because their results are stochastic. The script parses TensorBoard logs from the repeated runs, aggregates the results, and plots them with confidence intervals.
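
For reviewers who want a feel for the parse-and-aggregate step, here is a minimal sketch of one way it could work. It is not the code in this PR: the function names `load_scalar` and `aggregate_runs` are hypothetical, the interval is a normal-approximation band (mean ± 1.96 × standard error) chosen for illustration, and only steps logged by every run are kept. The only TensorBoard API used is `EventAccumulator` with its `Reload` and `Scalars` methods.

```python
import math
import statistics
from collections import defaultdict


def load_scalar(run_dir, tag):
    """Read one scalar tag from a TensorBoard run directory as {step: value}."""
    # Imported lazily so aggregate_runs below can be used without tensorboard.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # A size guidance of 0 keeps every logged scalar instead of subsampling.
    acc = EventAccumulator(run_dir, size_guidance={"scalars": 0})
    acc.Reload()
    return {event.step: event.value for event in acc.Scalars(tag)}


def aggregate_runs(runs, z=1.96):
    """Combine per-run {step: value} dicts into (steps, means, half_widths).

    Only steps present in every run are kept. Each half-width is z times the
    standard error of the mean (an assumption made for this sketch).
    """
    by_step = defaultdict(list)
    for run in runs:
        for step, value in run.items():
            by_step[step].append(value)

    steps = sorted(s for s, vals in by_step.items() if len(vals) == len(runs))
    means, half_widths = [], []
    for step in steps:
        vals = by_step[step]
        means.append(statistics.fmean(vals))
        # The sample standard deviation needs two or more runs; use 0 otherwise.
        sem = statistics.stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else 0.0
        half_widths.append(z * sem)
    return steps, means, half_widths
```

With few repeated runs, a Student's t interval would be wider than this normal approximation, so the band here should be read as a rough indication only.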

Here is a sample plot generated by the script, showing evaluation performance on the MATH500 benchmark for Qwen2.5-1.5B trained with GRPO on the GSM8K and MATH datasets, respectively:

Note: The script requires the matplotlib package.

Example Usage:

In the current version, this script functions as a standalone utility. Users need to manually specify the paths and configurations for each experiment in a YAML file. To generate the plots, run the following command:

python scripts/multi_exps_plot/multi_exps_plot.py --config scripts/multi_exps_plot/plot_configs.yaml
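
The schema of `plot_configs.yaml` is not shown in this description, so the sketch below only illustrates the general idea of a config-driven comparison plot. Every key in `EXAMPLE_CONFIG` (`metric`, `output`, `experiments`, `label`, `runs`), every path, and both helper functions are hypothetical and are not taken from the script. The dictionary stands in for whatever a YAML loader would return, and `plot_curves` expects curves that have already been aggregated into means and interval half-widths.

```python
# Hypothetical config, written as the dict a YAML loader might return.
# None of these keys or paths come from the real plot_configs.yaml.
EXAMPLE_CONFIG = {
    "metric": "eval/accuracy",
    "output": "comparison.png",
    "experiments": [
        {"label": "GRPO on GSM8K", "runs": ["logs/gsm8k/run1", "logs/gsm8k/run2"]},
        {"label": "GRPO on MATH", "runs": ["logs/math/run1", "logs/math/run2"]},
    ],
}


def experiment_runs(config):
    """Return (label, run_dirs) pairs, raising ValueError on malformed input."""
    pairs = []
    for exp in config.get("experiments", []):
        label, runs = exp.get("label"), exp.get("runs")
        if not label or not runs:
            raise ValueError(f"experiment needs a label and at least one run: {exp}")
        pairs.append((label, list(runs)))
    if not pairs:
        raise ValueError("config lists no experiments")
    return pairs


def plot_curves(curves, metric, output):
    """Plot {label: (steps, means, half_widths)} as mean lines with shaded bands."""
    # Imported lazily; the Agg backend renders to a file without a display.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, (steps, means, half_widths) in curves.items():
        lower = [m - h for m, h in zip(means, half_widths)]
        upper = [m + h for m, h in zip(means, half_widths)]
        ax.plot(steps, means, label=label)
        ax.fill_between(steps, lower, upper, alpha=0.2)  # confidence band
    ax.set_xlabel("step")
    ax.set_ylabel(metric)
    ax.legend()
    fig.savefig(output)
    plt.close(fig)
```

Validating the config before reading any logs means a typo in one experiment entry fails fast instead of after the other runs have been parsed.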

[TODO] Automate the process of running repeated experiments and generating comparison plots.

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

scripts/multi_exps_plot/multi_exps_plot.py

lingzhq and others added 2 commits July 14, 2025 17:18

Add a script for plotting multi-experiment results

111cd36

Merge branch 'modelscope:main' into feat/multi_plot

fa74f70

lingzhq requested a review from yxdyc July 14, 2025 13:07

pan-x-c reviewed Jul 15, 2025

scripts/multi_exps_plot/multi_exps_plot.py (resolved)

Add readme

a869faa

pan-x-c approved these changes Jul 15, 2025

pan-x-c merged commit 675ff5b into agentscope-ai:main Jul 15, 2025
2 checks passed
